Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Less temperamental bwc artifacts check when between versions #7398

Closed

Conversation

peternied
Copy link
Member

Description

During the release process there are times when OpenSearch has branches such as 2.x that have increamented their build version number, but that number had not be backported to main. During this time bwc test can fail because the expected artifact version in main is lower than the version from the branch that is built.

This pull request allows for a margin of error between the expected artifacts with acceptable alternative artifacts which match either the next patch version, or the next minor version. This should prevent unrelated changes from being caught by the misfortunate time between version update backports.

Example output if alternative artifacts are found:

 [2.7.0] BUILD SUCCESSFUL in 3s
 [2.7.0] 170 actionable tasks: 3 executed, 167 up-to-date
Building 2.7.0 found acceptable alternative artifact opensearch-min-2.7.1-SNAPSHOT-linux-x64.tar.gz

Related Issues

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Changed the error message to include other files in the output directory
even if this projectArtifact wasn't found.  In this case it looks like
2.7.0 -> 2.7.1 version bump is causing the troubled, unrelated to this
  pull request.

Signed-off-by: Peter Nied <petern@amazon.com>
Signed-off-by: Peter Nied <petern@amazon.com>
Signed-off-by: Peter Nied <petern@amazon.com>
@reta
Copy link
Collaborator

reta commented May 3, 2023

Personally, I think it will hide the problem instead of manifesting it. Example right here: yesterday, for some reason, the autocut bot did not create pull request for main but only for 2.x and 2.7. We noticed that because build started to fail and the fix was sent shortly. With this change - the problem could stay unnoticed for very long, my 5 cents.

@github-actions
Copy link
Contributor

github-actions bot commented May 3, 2023

Gradle Check (Jenkins) Run Completed with:

@peternied
Copy link
Member Author

Yesterday, for some reason, the autocut bot did not create pull request for main but only for 2.x and 2.7. We noticed that because build started to fail and the fix was sent shortly.

That doesn't sound good, glad it was caught. There are 95 pull requests in flight, should we be blocking all those contributions because of that issue?

With this change - the problem could stay unnoticed for very long, my 5 cents.

These tests seems to be built to monitor specific aspects of the project. If we were to run them on a daily schedule where maintainers/interested contributors could monitor the results would that be a valuable alternative?

As a contributor to OpenSearch, I have to run and re-run the CI checks so often because of flaky tests or unrelated changes in flight, its discouraging.

Signed-off-by: Peter Nied <petern@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented May 3, 2023

Gradle Check (Jenkins) Run Completed with:

@peternied
Copy link
Member Author

Test failures are legit, they are associated with the version number mismatch during upgrade test. Depending on what others feel like I can update these or not. Details https://build.ci.opensearch.org/job/gradle-check/14820/testReport/

org.opensearch.upgrades.FullClusterRestartIT.testSnapshotRestore

Failing for the past 1 build (Since #14820 )
Took 8.7 sec.
Error Message
java.lang.AssertionError: expected:<[2.7.0]> but was:<[2.7.1]>

@reta
Copy link
Collaborator

reta commented May 3, 2023

That doesn't sound good, glad it was caught. There are 95 pull requests in flight, should we be blocking all those contributions because of that issue?

These tests seems to be built to monitor specific aspects of the project. If we were to run them on a daily schedule where maintainers/interested contributors could monitor the results would that be a valuable alternative?

The BWC tests are important, any change could basically break them. It sad it happens but it does not happen every day - fixing the automation (autocut bot) would help.

As a contributor to OpenSearch, I have to run and re-run the CI checks so often because of flaky tests or unrelated changes in flight, its discouraging.

We probably should not mix flaky (yes, it is discouraging) and BWC tests - this failure is stable. We release once so often, the pull requests which were green - stay green, no action required. The pull requests which are in work (and are failing checks) should be rebased / merged with target branch (that window would have been minimized if automation worked out well), this is the good practice anyway since upstream is a moving target.

@saratvemulapalli
Copy link
Member

+1 to @reta. I am afraid we'll kick the can down the road and it will silently wait until the next release.
This mechanism enforces quality while trading for more dev cycles which is fair in my opinion.

@dblock
Copy link
Member

dblock commented May 3, 2023

I think AUTOCUT bot isn't the problem, the build failed and it told us. Is there (there should be) a way to make version increment PRs in such an order that doesn't ever break bwc tests? For example, if we added 2.7.1 on main, first?

@reta
Copy link
Collaborator

reta commented May 3, 2023

I think AUTOCUT bot isn't the problem, the build failed and it told us.

Sorry, I haven't seen that, mind sharing the link? Because #7370 and #7371 were right on time.

Is there (there should be) a way to make version increment PRs in such an order that doesn't ever break bwc tests? For example, if we added 2.7.1 on main, first?

Not sure, I believe we build by checking out branches (2.7 / 2.x /... ), needs some research

@peternied
Copy link
Member Author

It looks like those change to increment the version follow the normal pull request process. I think we could make it faster. What if instead of waiting on human approval / CI results we immediately pushed the changes to all branches.

We have nearly all the automation to do this via GitHub action, how would y'all feel about changing the workflow to do it all at once?

@reta
Copy link
Collaborator

reta commented May 3, 2023

It looks like those change to increment the version follow the normal pull request process. I think we could make it faster. What if instead of waiting on human approval / CI results we immediately pushed the changes to all branches.

We have nearly all the automation to do this via GitHub action, how would y'all feel about changing the workflow to do it all at once?

I was also wondering if we could do something like "multi branch commit" (sadly not a thing in git), but in general, I think it may work - push to respective branch bypassing pull request phase.

@kotwanikunal
Copy link
Member

I think AUTOCUT bot isn't the problem, the build failed and it told us. Is there (there should be) a way to make version increment PRs in such an order that doesn't ever break bwc tests? For example, if we added 2.7.1 on main, first?

I think the bot has some issues since it did not automatically raise the PR for main, which IIRC it used to.

@peternied
Copy link
Member Author

peternied commented May 3, 2023

Adding bwc version 2.7.1 after 2.7.0
Adding V_2_7_1 after V_2_7_0
sed: can't read server/src/main/java/org/opensearch/Version.java: No such file or directory
Error: Process completed with exit code 2.

https://github.com/opensearch-project/OpenSearch/actions/runs/4865699230/jobs/8676359444

I'll start up another PR

peternied added a commit to peternied/OpenSearch-1 that referenced this pull request May 5, 2023
This change alters the version workflow to create a new issue, then
create several commits linked back to that issue and finally close the
issue when all commits are completed for tracability of version updates.

Background

As version updates have different propigation speeds depending on pull
request reviews, consistant CI in many branches, and an careful order of
execution this would cause failures outside the scope of changes that
would block changes into main and other branches.

This change removes the use of pull requests and the standard ci gates.
To ensure that the changes operate consistantly created a GitHub Action
to manage the file modification peternied/opensearch-core-version-updater

See an example of this workflows results with #66

Related issues
- Resolves opensearch-project#7411
- Related Discussion opensearch-project#7398
- Resolves opensearch-project#7396

Signed-off-by: Peter Nied <petern@amazon.com>
@peternied
Copy link
Member Author

I don't think it makes sense to move forward with this pull request as is. Reducing the temperament seems more likely to cause issues to linger instead of being addressed.

In place of this change, I've created another pull request to address the root cause which is the delay between updates in different branches.

@peternied peternied closed this May 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] [Build Break] buildBwcLinuxTar is expecting 2.7.0 artifact but getting 2.7.1
5 participants